Evaluating Network Models: A Likelihood Analysis

Many models have been put forward to mimic the evolution of real networked systems. A well-accepted
way to judge their validity is to compare the modeling results with real networks with respect to
several structural features. Even for a specific real network, however, we cannot fairly evaluate
the goodness of different models, since there are too many structural features and no criterion for
selecting and weighting them. Motivated by the studies on link prediction algorithms, we propose a
unified method to evaluate network models by comparing the likelihoods of the currently observed
network under different models, with the assumption that the higher the likelihood is, the better
the model is. We test our method on the real Internet at the Autonomous System (AS) level, and the
results suggest that the Generalized Linear Preferential (GLP) model outperforms the Tel Aviv
Network Generator (Tang), while both models are better than the Barabási-Albert (BA) and
Erdős-Rényi (ER) models. Our method can be further applied to determine the optimal parameter
values, namely those that correspond to the maximal likelihood. An experiment indicates that the
parameters obtained by our method capture the characteristics of newly added nodes and links in the
AS-level Internet better than the original values in the literature.

I. INTRODUCTION



Recent years have witnessed the fast development of the
study of complex networks [1-4]. A network is a set of
items, called vertices, with connections between them,
called edges. Many natural and man-made systems can be
described as networks. Examples are numerous: biological
networks, including protein-protein interaction networks
[5] and metabolic networks [6]; social networks, such as
movie actor collaboration [7] and scientific collaboration
networks [8]; and technological networks, like power grids
[9], the WWW [10] and the Internet at the Autonomous
System (AS) level [11-16]. A major endeavor in academia is
to discover the common properties shared by many real
networks and the specific features owned by a certain type
of network. A great number of measurements have been
applied to reveal the structural features of networks [17].
The degree distribution [18], as one of the most important
global measurements, has attracted increasing attention
since the discovery of the scale-free property [19]. The
clustering coefficient is a local measurement that
characterizes the loop structure of order three. Another
significant measurement is the average distance. A network
is considered to be small-world if it has a large
clustering coefficient but a short average distance [9].
Besides the properties mentioned above, there are many
other measurements, such as degree-degree correlation [20],
betweenness centrality [21] and so forth. Moreover, some
statistical measurements borrowed from physics, such as
entropy [22], and novel metrics, such as modularity [23],
also play important roles in characterizing networks.

Current research interest has focused not only on the
statistical features but also on the dynamical evolution
of networks. A large number of models have been proposed
to reveal the origins of the impressive statistical
features of complex networks. There are also many evolving
models developed for certain types of networks, such as
the Internet at the AS level [11-16], social networks
[24-29] and so forth. However, the proliferation of
measurements sets a barrier to evaluating different
evolving models. The traditional idea is that if the
network generated by a model resembles the target network
in terms of some statistical features, usually selected by
the authors themselves, the model is claimed to be a
proper description of the real evolution. This methodology
is problematic. First, unselected statistical properties
are entirely ignored, so no one knows whether the model
describes them as well. Secondly, authors tend to select
the metrics that support their models, so it is impossible
to judge fairly which model is better. Thirdly, it is
difficult to quantify the extent to which a model
resembles the real evolving mechanism.

Inspired by link prediction approaches and likelihood
analysis, we propose a method that tries to evaluate
different models fairly and objectively. Link prediction
aims at estimating the likelihood of nonexisting edges in
a network and thereby uncovering the missing edges [30].
The evolution of networks involves two processes: one is
the addition or deletion of nodes, and the other is the
changing of edges between nodes [28]. In principle, the
rules by which a model adds edges can be considered as a
kind of link prediction algorithm, and here lies the
bridge between link prediction and the mechanisms of
evolving models.

The present paper is organized as follows. Section II
gives a general description of our method. Section III
introduces the data, explains in detail how to use our
method to evaluate evolving models with the AS-level
Internet as an example network, and presents the results.
We draw the conclusion and give some discussion in
Section IV.



II. METHOD

TABLE I: The number of nodes and edges of the three data
sets: two real data sets and one data set that is processed as
we describe in the paper.

Time                   # Nodes   # Edges
2006.06                22960     49545
2006.12                24403     52826
2006.12 (processed)    25103     59268

In this section, we give a general description of our
method for evaluating evolving models. An evolving model
describes the evolving mechanism of a real network or a
class of networks. Given two snapshots of one network at
times $t_1$ and $t_2$ ($t_1 < t_2$), as well as an evolving
model, we can in principle calculate the likelihood that
the network starting from the configuration at time $t_1$
will evolve to the configuration at $t_2$ under the rules of
the given model. We say a model is better than another one
if the likelihood of the former is greater than that of
the latter. However, how to calculate such a likelihood is
still a big challenge. Inspired by link prediction
algorithms, we can calculate the likelihood of the
addition of an edge according to a given evolving model
[30]. Within a short duration of time, the generation of
each edge can be considered independent of the others, and
the order in which edges are generated can be ignored.
Thus the likelihood mentioned above is the product of the
likelihoods of the newly generated edges.

Denote by $G$ the network and by $E_t$ the set of edges at
time step $t$. The set of new edges generated at the
current time step is $E^{new} = E_{t+1} \setminus E_t$. The
probability that node $i$ is selected as one end of a newly
generated edge is

\[ \Pi_i = f(G, \alpha), \tag{1} \]

where $\alpha$ is the set of parameters applied by the
model. Then the likelihood of a new monitored edge $(i,j)$
is

\[ P_{(i,j)} = \Pi_i \times \Pi_j. \tag{2} \]

Eq. (2) is applicable only when $i$ and $j$ are both old
nodes. If $i$ or $j$ is newly generated, we set $\Pi_i = 1$
or $\Pi_j = 1$. In order to make comparison between
different models, $P_{(i,j)}$ is normalized by
$1/\sum_{(a,b) \in E^N} P_{(a,b)}$, where $E^N$ is the set
of nonexisting edges (with $(i,j) \in E^N$). Given
different parameters $\alpha$, the values of $P_{(i,j)}$ may
be different, resulting in different likelihoods of the
target network. The parameters corresponding to the
maximum likelihood are intuitively considered to be the
optimal set of parameters for the evaluated model. In a
word, a network's likelihood can be calculated if the
evolution data and the corresponding model are given. If
there are several candidate models, our method can judge
them by comparing the corresponding likelihoods: the model
giving the higher likelihood of the target network is
favored.
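The procedure of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy path graph and the toy model in which $\Pi_i$ is proportional to $k_i - \beta$ (with $k_i$ the degree of node $i$) are all hypothetical. Logarithms are used because a product of many small edge likelihoods (the paper later reports values of order $10^{-124442}$) underflows double-precision floating point.

```python
import math
from itertools import combinations

def log10_likelihood(new_edges, candidate_edges, edge_prob):
    """Sum of log10 of the normalized likelihoods of the new edges.

    new_edges:       edges that appear between the two snapshots
    candidate_edges: nonexisting edges of the earlier snapshot (E^N)
    edge_prob:       callable (i, j) -> unnormalized P_(i,j) = Pi_i * Pi_j
    """
    z = sum(edge_prob(a, b) for a, b in candidate_edges)  # normalization
    return sum(math.log10(edge_prob(i, j) / z) for i, j in new_edges)

def best_parameter(grid, new_edges, candidate_edges, make_edge_prob):
    """Scan a parameter grid and return (argmax, max log10-likelihood)."""
    scores = {p: log10_likelihood(new_edges, candidate_edges, make_edge_prob(p))
              for p in grid}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy data: the path 0-1-2-3 gains the edge (1, 3).
degree = {0: 1, 1: 2, 2: 2, 3: 1}
old_edges = {(0, 1), (1, 2), (2, 3)}
candidates = [e for e in combinations(sorted(degree), 2) if e not in old_edges]
new_edges = [(1, 3)]

def shifted_degree_model(beta):
    """Toy model (an assumption): Pi_i proportional to k_i - beta."""
    total = sum(k - beta for k in degree.values())
    return lambda i, j: (degree[i] - beta) / total * (degree[j] - beta) / total

best_beta, best_score = best_parameter(
    [-1.0, 0.0, 0.5], new_edges, candidates, shifted_degree_model)
```

Passing the model as a callable keeps the likelihood routine independent of any particular attachment rule, so different candidate models can be compared on the same pair of snapshots.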



III. EXPERIMENTAL ANALYSIS 

In this paper we focus on the models of the AS-level
Internet. Two popular models, the Generalized Linear
Preferential model (GLP) [11] and the Tel Aviv Network
Generator (Tang) [15], are evaluated by our method. The
well-known Barabási-Albert (BA) [19] and Erdős-Rényi (ER)
[31, 32] models are also analyzed as two benchmarks.

The data sets we use here were collected by the Routeviews
Project [33]. We use the data of Jun. 2006 and Dec. 2006.
Some nodes and edges present in Jun. 2006 are missing from
the record of Dec. 2006. Although an autonomous system
might be canceled, this rarely happens within such a short
time span. Therefore we assume that the nodes and edges of
Jun. 2006 do not disappear by Dec. 2006; that is, the
network configuration in Jun. 2006 is a subgraph of that
in Dec. 2006. We merge the network of Jun. 2006 into that
of Dec. 2006 and then take the set difference between the
two to obtain the newly generated edges and nodes. The
basic information of the processed data set of Dec. 2006
and the two original data sets is shown in Table I.
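The merging and set-subtraction step can be sketched as follows. This is an illustration only: the function names and the toy edge lists are hypothetical, edges are assumed to be undirected pairs of node identifiers, and nodes are assumed to be known only through the edges they belong to.

```python
def normalize(edge):
    """Store an undirected edge as a sorted tuple so that (a, b) == (b, a)."""
    a, b = edge
    return (a, b) if a <= b else (b, a)

def merge_snapshots(earlier_edges, later_edges):
    """Merge the earlier snapshot into the later one and extract what is new."""
    earlier = {normalize(e) for e in earlier_edges}
    later = {normalize(e) for e in later_edges}
    processed = earlier | later        # earlier snapshot becomes a subgraph
    new_edges = processed - earlier    # set difference: newly generated edges
    old_nodes = {n for e in earlier for n in e}
    new_nodes = {n for e in processed for n in e} - old_nodes
    return processed, new_edges, new_nodes

# Hypothetical toy snapshots: edge (2, 3) is missing from the later record.
jun = [(1, 2), (2, 3)]
dec = [(2, 1), (3, 4), (4, 5)]
processed, new_edges, new_nodes = merge_snapshots(jun, dec)
```

In this toy case the processed later snapshot keeps the edge (2, 3) that the raw later record had lost, which mirrors why the processed Dec. 2006 data set in Table I is larger than the raw one.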

Now we describe how to calculate the likelihood of each
newly generated edge under each of the four models.

(i) GLP model - This model starts from a few nodes. At
each time step, with probability $1 - p$, one new node is
added and $m$ edges are generated between the new node and
$m$ old ones; with probability $p$, $m$ edges are generated
among the existing nodes. The ends of new edges are
selected following the rule of generalized linear
preferential attachment,



\[ \Pi_i = \frac{k_i - \beta}{\sum_j (k_j - \beta)}, \tag{3} \]



in which $k_i$ is the degree of node $i$ and
$\beta \in (-\infty, 1)$. In our method, if the end $i$ of a
new edge is selected among the existing nodes, then $\Pi_i$
is calculated by Eq. (3). Otherwise, if the end $i$ is
itself a new node, $\Pi_i$ is 1. So the likelihood of a new
edge connecting two existing nodes $a$ and $b$ is



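\[ P_{(a,b)} = \frac{k_a - \beta}{\sum_j (k_j - \beta)} \times \frac{k_b - \beta}{\sum_j (k_j - \beta)}. \tag{4} \]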



The likelihood of an edge generated between a new node $b$
and an existing node $a$ is

\[ P_{(a,b)} = \frac{k_a - \beta}{\sum_j (k_j - \beta)}. \tag{5} \]

When a new edge connects two new nodes $a$ and $b$, its
likelihood is

\[ P_{(a,b)} = 1. \tag{6} \]

FIG. 1: Likelihoods for different models and different parameters.
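The three GLP cases (two old nodes, one old and one new node, two new nodes) can be collected into a single function. The sketch below is illustrative: the function name and the toy degree table are hypothetical, and a node is treated as newly generated whenever it is absent from the degree table of the earlier snapshot.

```python
def glp_edge_likelihood(a, b, degree, beta):
    """Unnormalized likelihood of a new edge (a, b) under the GLP rule.

    degree: dict mapping each existing (old) node to its degree k_i;
            nodes missing from the dict are treated as newly generated.
    beta:   GLP parameter, assumed to satisfy beta < 1.
    """
    total = sum(k - beta for k in degree.values())  # denominator of Pi_i

    def pi(node):
        if node not in degree:        # new node: Pi is set to 1
            return 1.0
        return (degree[node] - beta) / total

    return pi(a) * pi(b)

# Hypothetical toy degrees of three old nodes.
degree = {"x": 3, "y": 2, "z": 1}
old_old = glp_edge_likelihood("x", "y", degree, beta=0.5)    # both ends old
old_new = glp_edge_likelihood("x", "n1", degree, beta=0.5)   # one end new
new_new = glp_edge_likelihood("n1", "n2", degree, beta=0.5)  # both ends new
```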



(ii) Tang model - This model applies a super-linear
preferential mechanism, namely

\[ \Pi_i = \frac{k_i^{1+\epsilon}}{\sum_j k_j^{1+\epsilon}}. \tag{7} \]



This model also starts with a few nodes. At each time
step, a new node is generated with one edge connecting it
to an existing node, which is selected with the
probability given in Eq. (7). The remaining $m - 1$ edges
are added between existing nodes. For each of these
$m - 1$ edges, one end is selected according to Eq. (7),
while the other is selected randomly. Hence the likelihood
of a new edge between existing nodes $a$ and $b$ is



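\[ P_{(a,b)} = \sqrt{\frac{k_a^{1+\epsilon}}{\sum_j k_j^{1+\epsilon}} \times \frac{1}{N} \times \frac{k_b^{1+\epsilon}}{\sum_j k_j^{1+\epsilon}} \times \frac{1}{N}}, \tag{8} \]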



where $N$ is the current size of the monitored network.
Eq. (8) takes a geometric mean because either $a$ or $b$
could be the one selected randomly. The cases involving
new nodes are handled in the same way as for the GLP
model.

(iii) BA model - The BA model also starts from a small
graph, and at each time step a new node with $m$ edges is
added. The probability that the existing node $i$ is
selected is



\[ \Pi_i = \frac{k_i}{\sum_j k_j}. \tag{9} \]



Note that the original BA model cannot deal with the
situation where edges are generated between two existing
nodes. We thus generalize the BA model: if an edge is
generated between two existing nodes, one node is selected
preferentially following Eq. (9) and the other is selected
randomly. Therefore the likelihood of an edge between two
existing nodes $a$ and $b$ is calculated as



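\[ P_{(a,b)} = \sqrt{\frac{k_a}{\sum_j k_j} \times \frac{1}{N} \times \frac{k_b}{\sum_j k_j} \times \frac{1}{N}}. \tag{10} \]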



The likelihood of an edge connecting a new node $b$ and
an old one $a$ is

\[ P_{(a,b)} = \frac{k_a}{\sum_j k_j}. \tag{11} \]

The likelihood of a new edge generated between two new
nodes is 1 as discussed above.

(iv) ER model - The mechanism of this model is that when
one edge is generated, both its ends are selected in a
random fashion. The likelihood of one edge $(a, b)$ between
two old nodes is

\[ P_{(a,b)} = \frac{1}{N^2}. \tag{12} \]

FIG. 2: (a) The average degree of the newly generated nodes; (b) the density of interaction among the newly generated
nodes; (c) the fraction of leaves among the newly generated nodes. Each panel compares GLP(0.230), GLP(0.616),
Tang(0.025) and Tang(0.200). The dashed line in each plot represents the value for the real Internet (0.516099 for the
fraction of leaves). The structural features of the networks obtained with the parameters suggested by our method are
closer to reality. For each model with each parameter, we generate 100 networks and use the so-called box-and-whisker
plot [34] to display the results, where the horizontal lines from top to bottom respectively stand for the maximum, the
upper quartile, the median, the lower quartile and the minimum of a set of data.

The calculation for the other two types of edges is
similar to that of GLP. Note that BA is a special case
equivalent to the GLP model when $\beta = 0$. It is also
obvious that the ER model is a special case of the Tang
model when $\epsilon = 0$.
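The edge likelihoods of the Tang, BA and ER models can be sketched in the same style. The code below is an illustration under assumptions: the function names and the toy degree table are hypothetical, every old node is assumed to have degree at least 1, a node absent from the degree table is treated as new, and the old-old case uses the geometric-mean form described for Eq. (8), applied to the generalized BA model as well.

```python
import math

def tang_edge_likelihood(a, b, degree, eps):
    """Unnormalized likelihood of a new edge (a, b) under the Tang rule."""
    total = sum(k ** (1 + eps) for k in degree.values())
    n = len(degree)                                   # current network size N
    pi = lambda v: degree[v] ** (1 + eps) / total     # super-linear preference
    a_old, b_old = a in degree, b in degree
    if a_old and b_old:
        # one end preferential, the other uniform; geometric mean of both orders
        return math.sqrt((pi(a) / n) * (pi(b) / n))
    if a_old or b_old:
        return pi(a) if a_old else pi(b)              # the new end contributes 1
    return 1.0

def ba_edge_likelihood(a, b, degree):
    """Generalized BA: linear preference for one end, uniform for the other."""
    total = sum(degree.values())
    n = len(degree)
    pi = lambda v: degree[v] / total
    a_old, b_old = a in degree, b in degree
    if a_old and b_old:
        return math.sqrt((pi(a) / n) * (pi(b) / n))
    if a_old or b_old:
        return pi(a) if a_old else pi(b)
    return 1.0

def er_edge_likelihood(a, b, degree):
    """ER: every old end is chosen uniformly among the N existing nodes."""
    n = len(degree)
    pi = lambda v: 1.0 / n if v in degree else 1.0
    return pi(a) * pi(b)

# Hypothetical toy degrees of three old nodes.
degree = {"x": 3, "y": 2, "z": 1}
tang_xy = tang_edge_likelihood("x", "y", degree, eps=1.0)
ba_xy = ba_edge_likelihood("x", "y", degree)
er_xy = er_edge_likelihood("x", "y", degree)
```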

The likelihoods of the four evolving models with
different parameters are shown in Figure 1. The maximum
likelihoods as well as the corresponding parameters are
listed in Table II. The maximum likelihoods of both
specific Internet models (GLP and Tang) are greater than
those of the BA model and the ER model. Notice that the BA
and ER models are parameter-free and are thus represented
by two straight lines in Figure 1. Our results suggest
that, in mimicking the AS-level Internet evolution, the
GLP model is better than the Tang model, the Tang model is
better than the BA model, and the ER model performs the
worst. A puzzling point is that the optimal parameters
corresponding to the maximum likelihoods are far from the
ones suggested in the original literature [11, 15]. We
next devise an experiment to demonstrate that the
parameters obtained by our method are more advantageous
than the original ones.

TABLE II: Maximum likelihoods and the corresponding
parameters for the four models.

Model   Maximum likelihood                      Optimum parameter
GLP     3.54 × 10^[exponent illegible]          β = 0.230
Tang    9.77 × 10^{-124442}                     ε = 0.025
ER      4.17 × 10^{-[digits illegible]2356}     N/A
BA      2.26 × 10^{-124449}                     N/A

Traditionally, an evolving model starts from a small
network with a few nodes. In this experiment, we instead
use the GLP and Tang models to drive the network evolution
starting from the configuration of Jun. 2006 and ending
when the network reaches the size of the configuration of
Dec. 2006. According to Refs. [11, 15] and the data, the
original parameters are $\beta = 0.616$, $m = 1.13$,
$p = 0.5214$ and $\epsilon = 0.2$. We then analyze some
statistical features of the newly generated part,
including the average degree, the density of interaction
and the fraction of leaves. We find that the GLP model
performs better than the Tang model in all three cases
when both use the same kind of parameters (original or
optimal), which demonstrates that our evaluating method is
reasonable. For both models, the statistical features
obtained with the optimum parameters suggested by us
resemble the real data better than those obtained with the
original parameters. The comparisons are shown in
Figure 2.
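The three features of the newly generated part can be computed as in the following sketch. The definitions used here are assumptions, since the text does not spell them out: the average degree is the mean degree of the new nodes in the final network, the density of interaction is the number of edges among new nodes divided by the number of possible pairs of new nodes, and a leaf is a new node of degree 1. The function name and the toy data are hypothetical.

```python
def new_part_features(edges, new_nodes):
    """Return (average degree, density, fraction of leaves) of the new nodes.

    edges:     iterable of distinct undirected edges (a, b) of the final network
    new_nodes: nodes generated after the earlier snapshot (non-empty)
    """
    edges = list(edges)
    new_nodes = set(new_nodes)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n = len(new_nodes)
    avg_degree = sum(degree.get(v, 0) for v in new_nodes) / n
    # edges with both ends among the new nodes
    inner = sum(1 for a, b in edges if a in new_nodes and b in new_nodes)
    density = inner / (n * (n - 1) / 2) if n > 1 else 0.0
    leaves = sum(1 for v in new_nodes if degree.get(v, 0) == 1) / n
    return avg_degree, density, leaves

# Hypothetical toy network: nodes 4, 5 and 6 are new.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
avg_degree, density, leaves = new_part_features(edges, {4, 5, 6})
```

Applying the same function to each generated network and to the real Dec. 2006 network gives the kind of per-model comparison that Figure 2 summarizes.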

IV. CONCLUSION AND DISCUSSION 

Thousands of network models have been put forward in the
last ten years. Some of them aim at uncovering the
mechanisms that underlie general topological properties,
like the scale-free nature and the small-world phenomenon;
others are proposed to reproduce structural features of
specific networks, such as the Internet, the World Wide
Web, co-authorship networks, food webs, protein-protein
interaction networks, metabolic networks, and so on.
Despite this prosperity, we are concerned that there is no
unified method to evaluate the performance of different
models, even if the target network is given beforehand.

Instead of considering many structural metrics, this
paper reports an evaluating method based on likelihood
analysis, with the assumption that a better model will
assign a higher likelihood to the observed structure. We
have tested our method on the real Internet at the AS
level, and the results suggest that the GLP model
outperforms the Tang model, and both models are better
than the BA and ER models. This method can be further
applied to determine the optimal parameters of network
models, and the experiment indicates that the parameters
obtained by our method can better capture the structural
characteristics of newly added nodes and links.

The main contributions of this work are twofold. On the
methodological side, we provide a starting point towards a
unified way to evaluate network models. In terms of
perspective, we believe that for the majority of real
evolving networks, the driving factors and the parameters
vary in time. For example, a recent empirical analysis
suggests that the Internet at the AS level grew with
different mechanisms before and after the year 2004 [16].
Finding a single mechanism that drives a network from a
little baby to a giant may be an infeasible task. In fact,
in different stages a network could grow in different
ways, or in a hybrid manner with a changing weight
distribution over several mechanisms. The research focus
once shifted from analyzing static models to evolutionary
models. In the near future, it may shift from the
evolutionary models to the evolution of the evolutionary
models themselves. In principle, the current method could
capture the tracks of not only the network evolution but
also the mechanism evolution. Hopefully this work can
provide some insights into the studies on network
modeling.